Skip to content

[fix](uniform function) fix constant argument handling and use ColumnView #63076

Merged
Mryange merged 2 commits into
apache:masterfrom
Mryange:fix-uniform
May 20, 2026
Merged

[fix](uniform function) fix constant argument handling and use ColumnView #63076
Mryange merged 2 commits into
apache:masterfrom
Mryange:fix-uniform

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented May 8, 2026

What problem does this PR solve?

Issue Number: N/A

Problem Summary:

The uniform function takes three arguments: min, max, and seed. Only the first two (min, max) are truly "always constant" — the seed column should be treated as a
regular column, not a constant. Without overriding get_arguments_that_are_always_constant(), when a user passes a constant value as the third argument (seed), the
default framework logic treats it as a constant column, leading to incorrect results.

Root cause: the base class default get_arguments_that_are_always_constant() does not distinguish between the seed argument and the min/max arguments, so a constant
seed would be folded by the constant-handling path rather than being treated as a per-row value.

Fix:

  • Override get_arguments_that_are_always_constant() to return {0, 1}, explicitly marking only min and max as always-constant arguments.
  • Refactor seed column access to use ColumnView<TYPE_BIGINT> for safer and more idiomatic typed column iteration.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed PR #63076 against the Doris code-review checklist. I found one blocking correctness issue in the BE scalar function constant-argument path.

Critical checkpoints:

  • Goal/test: The PR tries to fix uniform constant argument handling, but the all-constant execution path can still execute with inconsistent temporary column sizes and is not proven by a regression test.
  • Scope: The code change is small, but it changes framework metadata with non-local effects in PreparedFunctionImpl::default_implementation_for_constant_arguments.
  • Concurrency/lifecycle/config/compatibility/persistence/data writes: Not applicable; this is a stateless BE scalar function change with no new config, storage format, transaction, or protocol changes.
  • Parallel paths: Nereids rejects literal seed arguments, but BE/classic execution and BE constant-fold execution still need to be safe because this BE implementation is the final execution boundary.
  • Conditional checks/error handling: Existing min/max validation is preserved, but the seed literal/non-constant invariant is still not enforced in BE.
  • Tests: Existing regression tests cover Nereids analysis errors, but there is no added test covering the BE all-constant framework path affected by this override.
  • Observability/performance: No new observability needed; performance impact is negligible relative to per-row RNG.

User focus: No additional user-provided review focus was specified.

@@ -157,6 +159,8 @@ class FunctionUniform : public IFunction {
return Impl::get_variadic_argument_types();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This override makes the default all-constant path keep min/max as ColumnConst but unwrap the seed argument to its nested data column. In PreparedFunctionImpl::default_implementation_for_constant_arguments, temporary_block.rows() is then taken from the first column, so for a row-producing query such as a classic-planner path that reaches BE with uniform(1, 100, 1), the temp block has min as ColumnConst(size = input_rows_count) and seed as ColumnInt64(size = 1). execute_impl loops input_rows_count rows and ColumnView treats the unwrapped seed as non-const, so value_at(1) and later read past the one-row seed column.

Nereids currently rejects a literal seed, but this BE function is also the execution boundary for non-Nereids/classic and BE constant-fold paths, and this PR specifically changes BE constant handling. Please either enforce the third argument must not be constant at BE execution/open time or disable/adjust the default constant-argument path so the seed remains a correctly sized ColumnConst rather than a one-row nested column.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This override makes the default all-constant path keep min/max as ColumnConst but unwrap the seed argument to its nested data column. In PreparedFunctionImpl::default_implementation_for_constant_arguments, temporary_block.rows() is then taken from the first column, so for a row-producing query such as a classic-planner path that reaches BE with uniform(1, 100, 1), the temp block has min as ColumnConst(size = input_rows_count) and seed as ColumnInt64(size = 1). execute_impl loops input_rows_count rows and ColumnView treats the unwrapped seed as non-const, so value_at(1) and later read past the one-row seed column.

Nereids currently rejects a literal seed, but this BE function is also the execution boundary for non-Nereids/classic and BE constant-fold paths, and this PR specifically changes BE constant handling. Please either enforce the third argument must not be constant at BE execution/open time or disable/adjust the default constant-argument path so the seed remains a correctly sized ColumnConst rather than a one-row nested column.

某个设计上的问题,目前的be的执行框架,在执行的时候,一个函数/表达式 返回会返回column const,或者非column const (有一些函数会根据runtime的value来直接返回一个column const 的column

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

run buildall

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

run beut

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 57.14% (4/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.76% (27767/37643)
Line Coverage 57.63% (300623/521658)
Region Coverage 54.85% (250522/456715)
Branch Coverage 56.39% (108384/192189)

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 18, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.49% (20640/38588)
Line Coverage 37.15% (195052/525079)
Region Coverage 33.49% (152529/455411)
Branch Coverage 34.55% (66552/192622)

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31460 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f1d650520a1823dffe687a4dab8e88157e4d8e84, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17613	3977	3973	3973
q2	q3	10753	1459	830	830
q4	4682	482	354	354
q5	7596	2266	2098	2098
q6	243	178	141	141
q7	973	784	642	642
q8	9356	1909	1684	1684
q9	5569	4964	4930	4930
q10	6454	2128	1820	1820
q11	441	273	244	244
q12	700	427	305	305
q13	18269	3410	2806	2806
q14	266	268	240	240
q15	q16	823	777	704	704
q17	999	987	976	976
q18	6913	5688	5613	5613
q19	1177	1353	1118	1118
q20	528	421	259	259
q21	5712	2609	2423	2423
q22	438	363	300	300
Total cold run time: 99505 ms
Total hot run time: 31460 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4318	4237	4288	4237
q2	q3	4474	4965	4301	4301
q4	2130	2262	1399	1399
q5	4481	4376	4357	4357
q6	256	304	218	218
q7	2138	2074	1740	1740
q8	2676	2289	2272	2272
q9	8081	7740	7779	7740
q10	4552	4489	4113	4113
q11	637	427	393	393
q12	736	760	552	552
q13	3350	3705	2969	2969
q14	315	303	287	287
q15	q16	753	750	652	652
q17	1453	1346	1553	1346
q18	7883	7346	7283	7283
q19	1214	1146	1133	1133
q20	2222	2230	1943	1943
q21	5472	4759	4566	4566
q22	520	464	417	417
Total cold run time: 57661 ms
Total hot run time: 51918 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 168848 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f1d650520a1823dffe687a4dab8e88157e4d8e84, data reload: false

query5	4319	660	518	518
query6	327	214	204	204
query7	4270	545	302	302
query8	346	242	228	228
query9	8866	4035	4056	4035
query10	452	348	306	306
query11	5808	2363	2196	2196
query12	191	131	125	125
query13	1276	617	419	419
query14	6058	5387	5074	5074
query14_1	4412	4381	4335	4335
query15	217	208	189	189
query16	1037	465	402	402
query17	1168	739	611	611
query18	2766	489	368	368
query19	229	214	174	174
query20	142	139	143	139
query21	219	143	125	125
query22	13575	13466	13432	13432
query23	17252	16371	16018	16018
query23_1	16101	16156	16105	16105
query24	7579	1737	1298	1298
query24_1	1305	1301	1332	1301
query25	599	504	446	446
query26	1320	318	179	179
query27	2668	571	334	334
query28	4458	1994	1941	1941
query29	1040	655	533	533
query30	317	233	201	201
query31	1117	1066	949	949
query32	89	76	78	76
query33	548	371	304	304
query34	1172	1153	648	648
query35	758	828	668	668
query36	1316	1321	1224	1224
query37	151	105	93	93
query38	3191	3134	3059	3059
query39	928	936	899	899
query39_1	880	879	869	869
query40	225	152	126	126
query41	64	62	65	62
query42	110	108	108	108
query43	318	334	287	287
query44	
query45	208	201	191	191
query46	1059	1157	717	717
query47	2324	2357	2245	2245
query48	397	413	288	288
query49	631	499	379	379
query50	1025	356	254	254
query51	4260	4319	4208	4208
query52	105	106	94	94
query53	252	298	201	201
query54	306	267	265	265
query55	95	93	85	85
query56	306	299	291	291
query57	1408	1392	1326	1326
query58	295	265	273	265
query59	1492	1608	1407	1407
query60	319	315	295	295
query61	156	158	155	155
query62	666	627	575	575
query63	247	197	206	197
query64	2368	818	649	649
query65	
query66	1698	470	350	350
query67	29930	29322	29145	29145
query68	
query69	463	338	314	314
query70	1000	987	976	976
query71	303	271	272	271
query72	2964	2730	2381	2381
query73	834	776	408	408
query74	5080	4882	4768	4768
query75	2670	2575	2267	2267
query76	2307	1177	748	748
query77	397	406	322	322
query78	12294	12170	11695	11695
query79	1494	1024	744	744
query80	633	544	440	440
query81	449	280	250	250
query82	1368	164	123	123
query83	349	269	250	250
query84	263	142	113	113
query85	882	537	453	453
query86	390	311	291	291
query87	3409	3401	3190	3190
query88	3637	2653	2641	2641
query89	434	389	332	332
query90	1959	185	186	185
query91	179	166	140	140
query92	81	72	79	72
query93	1521	1447	848	848
query94	546	377	316	316
query95	683	469	347	347
query96	1051	787	345	345
query97	2688	2703	2555	2555
query98	240	231	242	231
query99	1157	1073	973	973
Total cold run time: 253407 ms
Total hot run time: 168848 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 57.14% (4/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.63% (27824/37788)
Line Coverage 57.58% (301570/523702)
Region Coverage 54.75% (251761/459826)
Branch Coverage 56.34% (108927/193352)

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 19, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 30774 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4eed6bb068569d4d8f2db2d7cd22f1a065b39080, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17591	3903	3874	3874
q2	q3	10760	1344	791	791
q4	4682	466	347	347
q5	7592	2277	2108	2108
q6	360	176	137	137
q7	996	754	643	643
q8	9619	1789	1644	1644
q9	7075	4882	4894	4882
q10	6439	2135	1789	1789
q11	435	276	249	249
q12	692	421	292	292
q13	18191	3394	2736	2736
q14	268	254	234	234
q15	q16	819	765	708	708
q17	907	944	915	915
q18	6803	5731	5440	5440
q19	1188	1342	1060	1060
q20	518	424	272	272
q21	5649	2601	2352	2352
q22	433	358	301	301
Total cold run time: 101017 ms
Total hot run time: 30774 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4208	4129	4164	4129
q2	q3	4470	4925	4262	4262
q4	2078	2223	1419	1419
q5	4407	4276	4267	4267
q6	225	177	125	125
q7	2353	1801	1585	1585
q8	2570	2096	2044	2044
q9	7781	7834	7773	7773
q10	4538	4509	4122	4122
q11	611	444	495	444
q12	757	759	528	528
q13	3430	3766	3030	3030
q14	288	319	275	275
q15	q16	723	758	652	652
q17	1360	1314	1329	1314
q18	7854	7360	6949	6949
q19	1142	1135	1114	1114
q20	2212	2201	1939	1939
q21	5345	4622	4509	4509
q22	526	471	408	408
Total cold run time: 56878 ms
Total hot run time: 50888 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169726 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4eed6bb068569d4d8f2db2d7cd22f1a065b39080, data reload: false

query5	4315	652	524	524
query6	347	221	214	214
query7	4280	519	291	291
query8	319	233	218	218
query9	8830	4000	4012	4000
query10	460	355	296	296
query11	5789	2405	2211	2211
query12	190	133	127	127
query13	1277	595	432	432
query14	6067	5376	5061	5061
query14_1	4364	4388	4380	4380
query15	217	206	184	184
query16	1026	438	449	438
query17	1166	738	605	605
query18	2525	507	356	356
query19	215	207	170	170
query20	138	131	138	131
query21	217	138	119	119
query22	13554	13548	13316	13316
query23	17244	16405	15953	15953
query23_1	16184	16252	16121	16121
query24	7549	1763	1283	1283
query24_1	1309	1308	1296	1296
query25	591	497	441	441
query26	1180	317	171	171
query27	2712	576	347	347
query28	4459	1978	1916	1916
query29	972	606	504	504
query30	300	232	198	198
query31	1108	1062	925	925
query32	87	75	74	74
query33	559	350	309	309
query34	1151	1152	637	637
query35	758	793	678	678
query36	1329	1324	1142	1142
query37	162	105	88	88
query38	3203	3166	3064	3064
query39	935	920	918	918
query39_1	852	898	887	887
query40	228	148	132	132
query41	64	63	63	63
query42	110	108	111	108
query43	321	335	283	283
query44	
query45	217	203	196	196
query46	1062	1186	700	700
query47	2298	2356	2148	2148
query48	413	396	287	287
query49	629	482	381	381
query50	997	362	254	254
query51	4317	4447	4235	4235
query52	105	106	95	95
query53	260	278	199	199
query54	306	280	261	261
query55	94	95	86	86
query56	293	299	305	299
query57	1443	1401	1325	1325
query58	291	274	290	274
query59	1562	1650	1377	1377
query60	344	350	316	316
query61	157	150	154	150
query62	680	626	545	545
query63	252	199	203	199
query64	2215	807	649	649
query65	
query66	1647	474	358	358
query67	29853	29809	29852	29809
query68	
query69	453	343	296	296
query70	1023	975	998	975
query71	315	284	276	276
query72	2980	2732	2421	2421
query73	811	784	435	435
query74	5092	4895	4711	4711
query75	2669	2587	2258	2258
query76	2277	1120	791	791
query77	398	405	326	326
query78	12094	12163	11631	11631
query79	1464	1036	751	751
query80	1277	559	460	460
query81	504	279	238	238
query82	1242	162	121	121
query83	344	275	249	249
query84	255	137	112	112
query85	927	539	456	456
query86	446	342	353	342
query87	3418	3410	3175	3175
query88	3498	2677	2673	2673
query89	455	380	341	341
query90	1885	181	182	181
query91	176	165	139	139
query92	79	79	76	76
query93	1529	1461	906	906
query94	731	353	300	300
query95	654	392	428	392
query96	1057	795	331	331
query97	2697	2680	2542	2542
query98	241	232	235	232
query99	1112	1102	984	984
Total cold run time: 253132 ms
Total hot run time: 169726 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.56% (20733/38713)
Line Coverage 37.19% (196046/527099)
Region Coverage 33.52% (153590/458267)
Branch Coverage 34.57% (66951/193677)

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 19, 2026

run cloud_p0

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.39% (24036/37915)
Line Coverage 47.14% (247857/525746)
Region Coverage 44.20% (204525/462688)
Branch Coverage 45.41% (88270/194404)

1 similar comment
@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.39% (24036/37915)
Line Coverage 47.14% (247857/525746)
Region Coverage 44.20% (204525/462688)
Branch Coverage 45.41% (88270/194404)

Copy link
Copy Markdown
Contributor

@HappenLee HappenLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 20, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completed the review against the full PR diff and the existing inline review context. I am requesting changes because the already-open thread on be/src/exprs/function/uniform.cpp remains a blocking correctness issue: the new {0, 1} get_arguments_that_are_always_constant() override still lets the default all-constant path unwrap a constant seed into a one-row nested column while execute_impl() iterates input_rows_count rows. I did not add another inline comment for that because it has already been raised in the existing review thread.

Critical checkpoint conclusions:

  • Goal/test: the goal is to make uniform handle constant-argument behavior correctly while treating only min/max as always constant. The current code does not fully accomplish this for all-constant BE execution paths; the added regression only checks the double non-constant-seed row count and does not prove the problematic all-constant BE path is safe.
  • Scope/clarity: the code change is small and focused, but it relies on default constant handling in a way that is still unsafe for the seed column.
  • Concurrency/lifecycle/config/compatibility: no new concurrency, special lifecycle, config, storage format, or FE-BE protocol compatibility concern found.
  • Parallel paths: Nereids rejects literal seed, but BE execution/constant-fold/classic paths are parallel entry points and still need safe behavior at the BE boundary.
  • Conditions/error handling: min/max validation is preserved; seed literal handling is not enforced at BE execution/open time.
  • Test coverage: coverage is insufficient for the BE constant-argument failure mode; the new test only validates successful double execution count with random()*10000.
  • Observability/performance/transaction/persistence: no additional issues found.

User focus: no additional user-provided review focus was specified.

@Mryange Mryange merged commit a70c212 into apache:master May 20, 2026
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants